Yichen Zhu

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

Jan 29, 2026

AutoDriDM: An Explainable Benchmark for Decision-Making of Vision-Language Models in Autonomous Driving

Jan 21, 2026

Let Me Show You: Learning by Retrieving from Egocentric Video for Robotic Manipulation

Nov 07, 2025

ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations

Oct 02, 2025

"Pull or Not to Pull?'': Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas

Aug 10, 2025

Active Multimodal Distillation for Few-shot Action Recognition

Jun 16, 2025

ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge

May 29, 2025

WorldEval: World Model as Real-World Robot Policies Evaluator

May 25, 2025

PointVLA: Injecting the 3D World into Vision-Language-Action Models

Mar 10, 2025